feat(hpc): LazyLock frozen SIMD dispatch table — detect once, keep CPU choice forever #38
Merged
simd_dispatch.rs (300+ lines, 7 tests):
SimdDispatch: struct of function pointers, frozen at first access via LazyLock.
Each field is a fn pointer to the best available implementation for this CPU.
After initialization: one pointer deref + one indirect call. Zero branching.
SimdTier enum: Avx512 / Avx2 / Sse2 / Scalar / WasmSimd128 (future).
Selected once based on simd_caps() detection. Frozen forever.
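
A minimal sketch of the detect-once tier selection, assuming x86_64 feature
probing via std::is_x86_feature_detected!. The detect_tier function and the
SIMD_TIER static are hypothetical stand-ins for the crate's simd_caps() and
are not taken from simd_dispatch.rs:

```rust
use std::sync::LazyLock;

/// Mirrors the tiers listed above; variant order is illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimdTier {
    Avx512,
    Avx2,
    Sse2,
    Scalar,
    WasmSimd128, // reserved for a future backend, never selected here
}

/// Hypothetical probe: pick the best tier the running CPU reports.
#[cfg(target_arch = "x86_64")]
fn detect_tier() -> SimdTier {
    if std::is_x86_feature_detected!("avx512f") {
        SimdTier::Avx512
    } else if std::is_x86_feature_detected!("avx2") {
        SimdTier::Avx2
    } else if std::is_x86_feature_detected!("sse2") {
        SimdTier::Sse2
    } else {
        SimdTier::Scalar
    }
}

/// On other architectures this sketch always falls back to scalar.
#[cfg(not(target_arch = "x86_64"))]
fn detect_tier() -> SimdTier {
    SimdTier::Scalar
}

/// Runs detect_tier() on first access; every later read sees the same value.
pub static SIMD_TIER: LazyLock<SimdTier> = LazyLock::new(detect_tier);
```

Reading *SIMD_TIER anywhere in the process returns the tier chosen on the
first access.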
Before: if simd_caps().avx512f { avx512_fn() } else { scalar_fn() } → ~1ns + branch
After: (SIMD_DISPATCH.fn_ptr)(args) → ~0.3ns, no branch
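
The before/after call shapes can be sketched as follows. This is an
illustration under assumptions, not the code in simd_dispatch.rs: the
single-field table, has_avx2, and byte_count_wide (a chunked stand-in for a
real AVX2 kernel) are hypothetical names.

```rust
use std::sync::LazyLock;

/// Portable fallback: count occurrences of `needle`.
fn byte_count_scalar(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// Stand-in for a vectorized variant: walks 32-byte chunks, then the tail.
fn byte_count_wide(haystack: &[u8], needle: u8) -> usize {
    let chunks = haystack.chunks_exact(32);
    let tail = chunks.remainder();
    let mut total = 0;
    for chunk in chunks {
        total += chunk.iter().filter(|&&b| b == needle).count();
    }
    total + byte_count_scalar(tail, needle)
}

#[cfg(target_arch = "x86_64")]
fn has_avx2() -> bool {
    std::is_x86_feature_detected!("avx2")
}

#[cfg(not(target_arch = "x86_64"))]
fn has_avx2() -> bool {
    false
}

/// Hypothetical one-field version of the table; the real struct has more.
pub struct SimdDispatch {
    pub byte_count: fn(&[u8], u8) -> usize,
}

/// Built on first access; the chosen pointers never change afterwards.
pub static SIMD_DISPATCH: LazyLock<SimdDispatch> = LazyLock::new(|| SimdDispatch {
    byte_count: if has_avx2() { byte_count_wide } else { byte_count_scalar },
});

/// Before: the capability check runs on every call.
pub fn byte_count_branching(haystack: &[u8], needle: u8) -> usize {
    if has_avx2() {
        byte_count_wide(haystack, needle)
    } else {
        byte_count_scalar(haystack, needle)
    }
}

/// After: one load of the stored pointer, then an indirect call.
pub fn byte_count_dispatched(haystack: &[u8], needle: u8) -> usize {
    (SIMD_DISPATCH.byte_count)(haystack, needle)
}
```

The parentheses around SIMD_DISPATCH.byte_count are required: without them
Rust parses the expression as a method call rather than a call through the
field.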
Dispatch targets (6 free functions across 4 modules):
byte_scan: byte_find_all, byte_count (AVX-512 / AVX2 / scalar)
distance: squared_distances_f32 (AVX2 / scalar)
nibble: nibble_unpack, nibble_above_threshold (AVX2 / scalar)
spatial_hash: batch_sq_dist (AVX2 / scalar)
NOTE: aabb.rs and cam_pq.rs dispatch on &self methods (not free functions)
so they keep inline simd_caps() branching. The dispatch table covers
the free function hot paths.
Visibility: internal SIMD functions promoted from pub(super)/private
to pub(crate) so the dispatch table can reference them as fn pointers.
The 8 existing per-call dispatch sites in nibble/byte_scan/distance/
spatial_hash/aabb/cam_pq still work — the dispatch table is additive.
Consumers can migrate to (simd_dispatch().fn_ptr)(args) incrementally.
TODO (separate PR): Rust 1.86 stabilized #[target_feature] on
safe functions. The `unsafe` on SIMD functions is legacy debt that
should be removed. The dispatch wrappers currently bridge this with
SAFETY comments; once unsafe is removed, the wrappers simplify to
direct function pointer assignment.
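
A sketch of the wrapper bridge described in the TODO, under stated
assumptions: sq_dist_avx2, sq_dist_avx2_wrapper, sq_dist_scalar, and
select_sq_dist are hypothetical names, and the AVX2 body is plain Rust that
the compiler may vectorize rather than hand-written intrinsics.

```rust
/// Signature shared by every implementation stored in the table.
type SqDistFn = fn(&[f32], &[f32]) -> f32;

/// Portable fallback: sum of squared differences.
fn sq_dist_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Compiled with AVX2 enabled for this function only.
///
/// # Safety
/// The caller must ensure the running CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sq_dist_avx2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Safe wrapper so the function fits a plain `fn` pointer field.
#[cfg(target_arch = "x86_64")]
fn sq_dist_avx2_wrapper(a: &[f32], b: &[f32]) -> f32 {
    // SAFETY: select_sq_dist() only hands out this wrapper after
    // is_x86_feature_detected!("avx2") has returned true.
    unsafe { sq_dist_avx2(a, b) }
}

/// Choose the pointer once; a dispatch table would store the result.
#[cfg(target_arch = "x86_64")]
fn select_sq_dist() -> SqDistFn {
    if std::is_x86_feature_detected!("avx2") {
        sq_dist_avx2_wrapper
    } else {
        sq_dist_scalar
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn select_sq_dist() -> SqDistFn {
    sq_dist_scalar
}
```

The SAFETY comment is only sound while the wrapper stays private to the
selection code; exposing it directly would let callers skip the feature check.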
https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7